Anytime analysis of moea-benchmark data
Reproducing the plots and tables from the ECJ paper that consider outputs over a range of maximum function evaluations.
This notebook has been rendered as an HTML page for easier navigation. The notebook is also available for cloning, or can be executed online using Binder or Colab.
Below, you will find the figures and tables from the ECJ paper that consider only outputs over a range of maximum function evaluations ($\textit{FE}_\textit{max}$), which we dub anytime analysis.
In the paper, we mostly focused on selected scenarios for brevity. In this notebook, results are first presented as in the paper, and then for additional experimental scenarios, where possible.
Finally, we remark that this first version of the notebook does not include Section 7 plots.
Setup
The data for anytime analysis is provided in the original moea-benchmark repository, and can be read using the pandas data science library for Python.
import pandas as pd
df_anytime = pd.read_csv("https://github.com/leobezerra/moea-benchmark/raw/master/anytime.csv.gz")
In the data above, setup indicates whether settings used are default or tuned. In the latter case, config indicates for which $\textit{FE}_\textit{max}$ value the settings were configured.
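As a quick sanity check, one way to see which setup/config combinations are present is to list the distinct pairs. The sketch below runs on a small synthetic frame mirroring the schema described above (the values are illustrative only, not taken from the benchmark data):

```python
import pandas as pd

# Synthetic rows mirroring the setup/config columns of anytime.csv.gz
# (values are illustrative assumptions, not actual benchmark data).
toy = pd.DataFrame({
    "setup": ["default", "tuned", "tuned", "tuned"],
    "config": [None, 2500, 10000, 40000],
})

# Distinct setup/config pairs: default rows have no config value, while
# tuned rows record the FE_max budget the settings were configured for.
pairs = toy.drop_duplicates().reset_index(drop=True)
print(pairs)
```

On the real data, replacing `toy` with `df_anytime[["setup", "config"]]` gives the actual combinations.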
Besides pandas, we will also use the Plotly interactive data visualization library.
import re
import plotly.express as px
import plotly.graph_objects as go
We make three adjustments to the data prior to plotting.
- To improve clarity, we fill the missing `config` values with `default`.
- Since the data was produced using 25 different seeds, we compute the mean of the runs.
- We index the data by $\textit{FE}_\textit{max}$, which greatly increases the memory usage, but is a requirement for plotting time series data with Plotly.
# Missing config values correspond to runs with default settings
df_anytime["config"] = df_anytime["config"].fillna('default')
# Average the 25 seeded runs per experimental scenario
df_anytime_mean = df_anytime.groupby(["setup", "config", "algo", "indicator", "nobj", "problem", "nvar"]).mean()
# The averaged seed column carries no meaning, so we drop it
df_anytime_mean = df_anytime_mean.drop(columns=["seed"])
ts_anytime = df_anytime_mean.stack().reset_index(
["setup", "config", "algo", "indicator", "nobj", "problem", "nvar"],
name="value"
)
ts_anytime.index = ts_anytime.index.astype("int")
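The reshaping above can be illustrated on a toy frame: grouping averages over the seeds, and stacking moves the per-$\textit{FE}_\textit{max}$ columns into a single value column indexed by FE. The column names below are a reduced, hypothetical version of the real schema:

```python
import pandas as pd

# Two seeds, two FE snapshot columns ("2500" and "10000"), one algorithm.
toy = pd.DataFrame({
    "algo": ["ibea", "ibea"],
    "seed": [1, 2],
    "2500": [0.10, 0.12],
    "10000": [0.20, 0.22],
})

# Average over seeds, then stack the FE columns into one long series.
toy_mean = toy.groupby("algo").mean().drop(columns=["seed"])
ts = toy_mean.stack().reset_index("algo", name="value")
ts.index = ts.index.astype("int")
print(ts)
```

The resulting frame is indexed by integer FE values, with one `value` row per snapshot, which is the shape Plotly expects for time series plots.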
Section 5: Preliminary analysis
In this notebook, we focus on figures and tables that use only anytime analysis, namely Figures 2 and 4.
The remaining figures of this section are provided in the snapshot analysis notebook.
Figure 2 depicts the evolution of the $\textit{HV}_\textit{rd}$ performance of IBEA using DE as its underlying EA with different numerical parameter settings, on a given experimental scenario.
A few resources from Plotly can be useful for navigation:
- selecting a subset of the settings, by clicking on their names in the legend
- zooming into a given range of the plot, by selecting an area of the plot
ts_ibea_WFG8_3_30 = ts_anytime.query("algo == 'ibea' and problem == 'WFG8' and nobj == 3 and nvar == 30")\
.sort_index()
fig2 = px.line(
ts_ibea_WFG8_3_30,
y="value",
x=ts_ibea_WFG8_3_30.index,
color="config",
line_dash="config",
)
Alternatively, we also provide code to produce the full set of plots from the data produced in this IBEA experiment.
ts_ibea = ts_anytime.query("algo == 'ibea'")
fig2_full = px.line(
ts_ibea,
y="value",
x=ts_ibea.index,
color="config",
line_dash="config",
animation_frame="problem",
facet_col="nvar",
facet_col_wrap=3,
range_y=(0,0.4)
)
We remark that, for some problems, IBEA might perform better at a given $\textit{FE}_\textit{max}$ snapshot using a setting configured for a different $\textit{FE}_\textit{max}$.
Yet, when we compute rank sums using the results from the values used for configuration, we see that each setting is the most adequate choice for its corresponding setup.
df_ibea_snapshots = ts_ibea.loc[[2500,10000,40000]].reset_index().rename(columns={"index": "FE"})
rs_ibea_snapshots = df_ibea_snapshots.drop(columns=["setup", "algo", "indicator"])\
.pivot_table(index=["problem", "nobj", "nvar", "FE"], columns=["config"])\
.rank(axis=1).groupby("FE").sum()
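The pivot/rank/sum chain can be checked on a toy example: settings are ranked within each scenario-snapshot row, and the ranks are summed per snapshot. The column names below are a reduced, hypothetical version of the real schema, with values chosen so that each config is best at its own budget:

```python
import pandas as pd

# Two scenarios x two FE snapshots, three configs (illustrative values;
# lower indicator values are assumed better here).
toy = pd.DataFrame({
    "problem": ["WFG8", "WFG8", "WFG9", "WFG9"],
    "FE": [2500, 10000, 2500, 10000],
    "2500": [0.10, 0.30, 0.12, 0.33],
    "10000": [0.20, 0.15, 0.22, 0.16],
    "40000": [0.25, 0.18, 0.27, 0.19],
})

# Rank the config columns within each row, then sum ranks per snapshot.
rs = toy.pivot_table(index=["problem", "FE"], values=["2500", "10000", "40000"])\
        .rank(axis=1).groupby("FE").sum()
print(rs)
```

In this toy output, config `2500` obtains the lowest rank sum at the FE = 2500 snapshot and config `10000` at FE = 10000, mirroring the observation in the paper.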
Figure 4 depicts the evolution of the $\textit{HV}_\textit{rd}$ performance of different MOEAs using parameter settings tuned for a common stopping criterion ($\textit{FE}_\textit{max} = 10{,}000$) on a given experimental scenario.
ts_10k_WFG3_3_50 = ts_anytime.query("config == 10000 and problem == 'WFG3' and nobj == 3 and nvar == 50")\
.sort_index()
fig4 = px.line(
ts_10k_WFG3_3_50,
y="value",
x=ts_10k_WFG3_3_50.index,
color="algo",
line_dash="algo",
)
Alternatively, we also provide code to produce the full set of plots from the data produced in this experiment.
ts_10k = ts_anytime.query("config == 10000")
fig4_full = px.line(
ts_10k,
y="value",
x=ts_10k.index,
color="algo",
line_dash="algo",
animation_frame="problem",
facet_col="nvar",
facet_col_wrap=3,
range_y=(0,0.4)
)
for k in fig4_full.layout:
if re.search('yaxis[1-9]+', k):
fig4_full.layout[k].update(matches=None)
Note that the erratic behavior of NSGA-II has been extensively reported in the literature, and is a consequence of its environmental replacement strategy.